GPhase: Greedy Approach for Accurate Haplotype Inferencing
Abstract
We consider the computational problem of phasing an individual genotype sample given a collection of known haplotypes in the population. We present GPhase, a fast and accurate algorithm for reconstructing a haplotype pair consistent with the input genotype. It uses the coalescent-based mutation model of Stephens and Donnelly (2000). Computing an optimal solution under this model is expensive, so our algorithm uses a greedy approximation for fast and accurate estimation. The algorithm is simple and efficient, with linear time and space complexity. In experiments on real datasets, GPhase achieved better gene-level phasing accuracy than other widely used tools such as SHAPEIT, Beagle, MaCH and Impute2. On simulated data, GPhase phased samples containing more than 1700 markers each with high accuracy. GPhase can be used for gene-level phasing of individual samples using publicly available haplotype datasets such as HapMap or 1000 Genomes data. This is useful in studies of recessive Mendelian disorders where parental data are lacking. GPhase is freely available for download and use from https://github.com/kshitijtayal/GPhase/.
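To make the notion of a haplotype pair "consistent with the input genotype" concrete, here is a minimal sketch in Python. It is not the GPhase algorithm and does not use the Stephens and Donnelly model. It only illustrates the consistency constraint, under the assumed encoding that haplotype alleles are 0 or 1 and each genotype entry is the alternate-allele count (0, 1 or 2) at a biallelic marker. The function names `is_consistent`, `complement` and `candidate_pairs` are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed encoding (not taken from the paper):
#   haplotype allele: 0 = reference, 1 = alternate
#   genotype entry:   number of alternate alleles at the marker (0, 1 or 2)
Haplotype = Tuple[int, ...]
Genotype = Tuple[int, ...]


def is_consistent(h1: Haplotype, h2: Haplotype, g: Genotype) -> bool:
    """True if the two haplotypes sum to the genotype at every marker."""
    if not (len(h1) == len(h2) == len(g)):
        return False
    return all(a + b == c for a, b, c in zip(h1, h2, g))


def complement(g: Genotype, h: Haplotype) -> Optional[Haplotype]:
    """Return the unique haplotype that pairs with h to give g, or None if h cannot fit g."""
    if len(g) != len(h):
        return None
    other = tuple(c - a for a, c in zip(h, g))
    # The complement must itself be a valid 0/1 haplotype.
    return other if all(x in (0, 1) for x in other) else None


def candidate_pairs(g: Genotype, panel: List[Haplotype]) -> List[Tuple[Haplotype, Haplotype]]:
    """List (panel haplotype, complement) pairs consistent with g, in panel order."""
    pairs = []
    for h in panel:
        other = complement(g, h)
        if other is not None:
            pairs.append((h, other))
    return pairs


if __name__ == "__main__":
    genotype = (0, 1, 2, 1)
    panel = [(0, 1, 1, 0), (1, 0, 1, 0), (0, 0, 1, 1)]
    for h, other in candidate_pairs(genotype, panel):
        print(h, other, is_consistent(h, other, genotype))
```

A phasing method then has to choose among such consistent pairs. A model-based tool scores them, whereas this sketch only enumerates the ones that involve a panel haplotype.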
Similar resources
Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model
Single Nucleotide Polymorphisms (SNPs) are the most common form of polymorphism in the human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. The particular pattern of these common variations forms a block-like structure on the human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...
HapCUT: an efficient and accurate algorithm for the haplotype assembly problem
MOTIVATION: The goal of the haplotype assembly problem is to reconstruct the two haplotypes (chromosomes) for an individual using a mix of sequenced fragments from the two chromosomes. This problem has been shown to be computationally intractable for various optimization criteria. Polynomial time algorithms have been proposed for restricted versions of the problem. In this article, we consider t...
Stochastic local search for large-scale instances of the haplotype inference problem by pure parsimony
Haplotype Inference is a challenging problem in bioinformatics that consists in inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information allows researchers to perform association studies for the genetic variants involved in diseases and the individual responses to therapeutic agents. A notable approach to the problem is to encode it as a com...
Stochastic local search for large-scale instances of the Haplotype Inference Problem by Parsimony
Haplotype Inference is a challenging problem in bioinformatics that consists in inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information allows researchers to perform association studies for the genetic variants involved in diseases and the individual responses to therapeutic agents. A notable approach to the problem is to encode it as a com...
Two-Level ACO for Haplotype Inference Under Pure Parsimony
Haplotype Inference is a challenging problem in bioinformatics that consists in inferring the basic genetic constitution of diploid organisms on the basis of their genotype. This information enables researchers to perform association studies for the genetic variants involved in diseases and the individual responses to therapeutic agents. A notable approach to the problem is to encode it as a co...